Skip to content

Conversation

@rlavaee
Copy link
Contributor

@rlavaee rlavaee commented Nov 17, 2025

This is the first PR to enable the prefetch optimization via Propeller based on our RFC. It enables emitting special symbols prefixed with __llvm_prefetch_target to point to the prefetch targets as specified via directives in the profile. A prefetch target is uniquely identified by its function name, basic block ID, and the subblock index (used when the target is after a call instruction).

A new pass is added which sets a field in basic blocks which have prefetch targets. The next PR will add the prefetch insertion logic into the same pass.

@github-actions
Copy link

github-actions bot commented Nov 17, 2025

🐧 Linux x64 Test Results

  • 186408 tests passed
  • 4858 tests skipped

// the block if `SubblockIndex` is the last call in the block).
struct SubblockID {
UniqueBBID BBID;
unsigned SubblockIndex;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the callsite index within the block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had originally named this callsiteIndex, but it will be a misnomer if there are not calls in the block (where we will use the index 0). What do u suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can define two 'pseudo calls' one at the beginning of a block and one at the end, so the first real call always get id == 1.

Besides we may want to differentiate between the code point before and after the call, so perhaps we can use negative sign for it. For instance t 2,-1 means the target is at block 2 before the first call, while t 2,1 means the target is in block 2 after the first call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CallSiteIndex seems more readable to me. It is fine that when it is zero it means the start of the block.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefetch targets and instructions are only placed at the beginning of a subblock (either the beginning of the block , or after calls). So we don't need the negativity.

we can define two 'pseudo calls' one at the beginning of a block and one at the end, so the first real call always get id == 1.

No need for a pseudo call at the end, but with one at the beginning, we get subBlockIndex+1. I can change it back to int callsiteIndex and have callsiteIndex == -1 refer to the prefetch target being mapped to the beginning of the block. WDYT about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I went back and changed it to unsigned 1-based callsiteIndexes. I explain in comments that callsite index of zero means the start of the block.

DenseMap<UniqueBBID, SmallVector<unsigned>> PrefetchTargetsByBBID;
for (const auto &Target : PrefetchTargets)
PrefetchTargetsByBBID[Target.BBID].push_back(Target.SubblockIndex);
for (auto &MBB : MF) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible to iterate the BBIDs and use MF->getBlockNumbered(ID) method to get MBB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't store a map from BBIDs to MBBs. So there is no way around once iterating over MBBs if that's what you mean (We can create the map here but it would still require an iteration).

;;
;; Specify the bb sections profile:
; RUN: echo 'v1' > %t
; RUN: echo 'f _Z3foob' >> %t
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The format is different from what is described in RFC which uses , instead of @

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I changed it to use comma. Alternatively, we can use spaces which will make the parsing code simpler.

@WenleiHe WenleiHe requested review from MatzeB and wlei-llvm November 17, 2025 23:37
@MatzeB
Copy link
Contributor

MatzeB commented Nov 18, 2025

Is this usable for anyone outside of google? I'm not fluent in the propeller codebase, so possible I overlooked something, but I don't see any code dealing with instruction prefetching in the propeller code at https://github.com/google/llvm-propeller (main branch)...

@rlavaee
Copy link
Contributor Author

rlavaee commented Nov 18, 2025

Is this usable for anyone outside of google? I'm not fluent in the propeller codebase, so possible I overlooked something, but I don't see any code dealing with instruction prefetching in the propeller code at https://github.com/google/llvm-propeller (main branch)...

The feature was added just today to the main branch: google/llvm-propeller@f36e59d

We have also discussed how this pass can be reused in other frameworks in the RFC; TLDR is we'll get there when we get there. Currently, we rely on BB_ADDR_MAP to do precise mapping to subblocks.

Copy link
Member

@tmsri tmsri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partially reviewed.

// the block if `SubblockIndex` is the last call in the block).
struct SubblockID {
UniqueBBID BBID;
unsigned SubblockIndex;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CallSiteIndex seems more readable to me. It is fine that when it is zero it means the start of the block.

// the edge a -> b (a is not cloned). The index of the path in this vector
// determines the `UniqueBBID::CloneID` of the cloned blocks in that path.
SmallVector<SmallVector<unsigned>> ClonePaths;
// Code prefetch targets, specified by the subblock ID of which beginning must
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest rephrasing this comment to something like: "Code Prefetch Target list, the beginning of each Subblock is the target".

/// `i`th call (or the beginning of the block if `i==0`) to before the`i+1`th
/// callsite (or the end of the block). The prefetch target is always the
/// beginning of the subblock.
SmallVector<unsigned> PrefetchTargetSubblockIndexes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define CallSiteIndex or SubblockIndex as unsigned type and use that instead of unsigned?

};

for (auto &MI : MBB) {
EmitPrefetchTargetSymbolIfNeeded();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have EmitPrefetchTargetIt take a parameter which is the unsigned callsiteindex. I would remove the updates to the PrefetchTargets from the lambda and do it within the loop and replace this line with:

if (PrefetchTargetIt != PrefetchTargets.end() &&
     *PrefetchTargetIt == LastCallSiteIndex) {
  EmitPrefetchTargetSymbolIfNeeded(*PrefetchTargetIt);
  ++PrefetchTargetIt;
}

// version, in which case the prefetch targets should also be replaced.
OutStreamer->emitSymbolAttribute(
PrefetchTargetSymbol,
MF->getFunction().isWeakForLinker() ? MCSA_Weak : MCSA_Global);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to internal linkage? Can you make this symbol a global, that would conflict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. We are relying on funique as mentioned in the RFC. Should we add a check here to ensure funique has been used (it's possible that funique is not applied on some functions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants